Natural language processing for similar languages, varieties, and dialects: A survey
Similar resources
A Resource for Natural Language Processing of Swiss German Dialects
Since there are only a few resources for Swiss German dialects, we compiled a corpus of 115,000 tokens, manually annotated with PoS tags. The goal is to provide a basic data set for developing NLP applications for Swiss German. We extended the original corpus and improved its annotation consistency. Furthermore, we trained dialect-specific PoS-tagging models and implemented a baseline system for...
Discrimination between Similar Languages, Varieties and Dialects using CNN- and LSTM-based Deep Neural Networks
In this paper, we describe a system (CGLI) for discriminating similar languages, varieties and dialects using convolutional neural networks (CNNs) and long short-term memory (LSTM) neural networks. We participated in the Arabic dialect identification sub-task of the DSL 2016 shared task, which distinguishes different Arabic language texts, under the closed submission track. Our proposed approach is l...
Natural Language Processing - A Survey
"Computer, would you search the Web for references to our company and determine the 3 most common complaints mentioned about us?" "Sure, Kevin." "Then summarize and format your findings and insert it into the second page of my slide presentation." "No problem." "And also, could you give me a reminder notice a half hour before my plane is supposed to leave?"
Survey: Natural Language Parsing For Indian Languages
Syntactic parsing is a necessary task for NLP applications, including machine translation. Developing a high-quality parser for morphologically rich and agglutinative languages is challenging. Syntactic analysis is used to understand the grammatical structure of a natural language sentence. It outputs the grammatical information for each word and its constituents. Also...
Twitter Language Identification Of Similar Languages And Dialects Without Ground Truth
We present a new method to bootstrap filter Twitter language ID labels in our dataset for automatic language identification (LID). Our method combines geolocation, original Twitter LID labels, and Amazon Mechanical Turk to resolve missing and unreliable labels. We are the first to compare LID classification performance using the MIRA algorithm and langid.py. We show classifier performance on di...
Journal
Journal title: Natural Language Engineering
Year: 2020
ISSN: 1351-3249,1469-8110
DOI: 10.1017/s1351324920000492